Extracting semantic clusters from the alignment of definitions

Authors

  • Gerardo Sierra
  • John McNaught
Abstract

Through the alignment of definitions from two or more different sources, it is possible to retrieve pairs of words that can be used indistinguishably in the same sentence without changing the meaning of the concept. As lexicographic work exploits common defining schemes, such as genus and differentia, a concept is similarly defined by different dictionaries. The difference in words used between two lexicographic sources lets us extend the lexical knowledge base, so that clustering is available through merging two or more dictionaries into a single database and then using an appropriate alignment technique. Since alignment starts from the same entry of two dictionaries, clustering is faster than any other technique. The algorithm introduced here is analogy-based, and starts from calculating the Levenshtein distance, which is a variation of the edit distance, and allows us to align the definitions. As a measure of similarity, the concept of longest collocation couple is introduced, which is the basis of clustering similar words. The process iterates, replacing similar pairs of words in the definitions until no new clusters are found.
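
The alignment step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain word-level Levenshtein distance computed by dynamic programming, followed by a traceback that recovers which words were matched, substituted, inserted, or deleted. This is not necessarily the variant the authors use, and it does not implement the longest collocation couple measure or the iterative replacement. The function names `align_definitions` and `substitution_pairs` and the two sample definitions are hypothetical, chosen only for illustration.

```python
def align_definitions(a, b):
    """Word-level Levenshtein alignment of two definitions (illustrative sketch).

    Returns (distance, pairs). Each pair is (word_a, word_b); None marks a
    word that was inserted or deleted rather than matched or substituted.
    """
    x, y = a.lower().split(), b.lower().split()
    n, m = len(x), len(y)

    # dist[i][j] = edit distance between the first i words of x and first j of y.
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a word of x
                dist[i][j - 1] + 1,         # insert a word of y
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if x[i - 1] == y[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((x[i - 1], y[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((x[i - 1], None))
            i -= 1
        else:
            pairs.append((None, y[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


def substitution_pairs(pairs):
    """Keep only aligned positions where the two definitions use different words."""
    return [(p, q) for p, q in pairs if p is not None and q is not None and p != q]


# Hypothetical definitions of the same entry from two dictionaries.
distance, pairs = align_definitions(
    "an instrument for measuring atmospheric pressure",
    "a device for measuring atmospheric pressure",
)
print(distance)
print(substitution_pairs(pairs))
```

In this sketch the substituted pairs are only candidates for a cluster. Deciding which of them are genuinely interchangeable is the role the abstract assigns to the longest collocation couple measure.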

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Extracting Dynamics Matrix of Alignment Process for a Gimbaled Inertial Navigation System Using Heuristic Dynamic Programming Method

In this paper, with the aim of estimating internal dynamics matrix of a gimbaled Inertial Navigation system (as a discrete Linear system), the discretetime Hamilton-Jacobi-Bellman (HJB) equation for optimal control has been extracted. Heuristic Dynamic Programming algorithm (HDP) for solving equation has been presented and then a neural network approximation for cost function and control input ...

Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems

Ontology is the main infrastructure of the Semantic Web which provides facilities for integration, searching and sharing of information on the web. Development of ontologies as the basis of semantic web and their heterogeneities have led to the existence of ontology matching. By emerging large-scale ontologies in real domain, the ontology matching systems faced with some problem like memory con...

Small-world Structure in Children’s Featured Semantic Networks

Background: Knowing the development pattern of children’s language is applicable in developmental psychology. Network models of language are helpful for the identification of these patterns.  Objectives: We examined the small-world properties of featured semantic networks of developing children. Materials & Methods: In this longitudinal study, the featured semantic networks of children aged 1...

Analogy-based Method for Semantic Clustering

An analogy-based clustering method is proposed, through the alignment of definitions from two different sources. The method relies on the assumption that two authors use different words to express a definition. The algorithm introduced here is analogy-based, and starts from calculating the Levenshtein distance, which is a variation of the edit distance, and allows us to align the definitions. A...


Publication date: 2000